Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines
Abstract
The chemical modification of histones at specific DNA regulatory elements is linked to the activation, inactivation and poising of genes. A number of tools exist to predict enhancers from chromatin modification maps, but their practical application is limited because they either (i) consider fewer marks than are needed to define the various enhancer classes or (ii) require an excessive number of marks, which is experimentally unviable. We have developed ChromaGenSVM, a method for chromatin state detection that combines support vector machines with genetic algorithm optimization. ChromaGenSVM selects optimal combinations of specific histone epigenetic marks to predict enhancers. In an independent test, ChromaGenSVM recovered 88% of the experimentally supported enhancers in the pilot ENCODE region of interferon gamma-treated HeLa cells. Furthermore, by combining the profiles of only five distinct methylation and acetylation marks from ChIP-seq libraries generated in human CD4(+) T cells, ChromaGenSVM predicted ∼21,000 experimentally supported enhancers within 1.0 kb regions with a precision of ∼90%, a 21% improvement over previous predictions on the same dataset. The combined results indicate that ChromaGenSVM comfortably outperforms previously published methods and that enhancers are best predicted by specific combinations of histone methylation and acetylation marks.
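The core idea, a genetic algorithm searching over combinations of histone marks with an SVM scoring each combination, can be illustrated with a minimal wrapper-style sketch. Everything below is an assumption for illustration, not the published ChromaGenSVM implementation: the data are synthetic, the SVM is a plain linear soft-margin SVM trained with the Pegasos sub-gradient method (swapped in so the example needs only the standard library), and the fitness function (validation accuracy minus a hypothetical per-mark cost), the GA operators (elitism, tournament selection, single-point crossover, bit-flip mutation) and all parameter values are hypothetical choices.

```python
import random
from typing import Dict, List, Sequence, Tuple

Mask = Tuple[bool, ...]  # one bit per histone mark: True = mark is used
Data = Tuple[List[List[float]], List[int]]


def make_toy_data(n: int, n_marks: int, informative: Sequence[int], rng: random.Random) -> Data:
    """Hypothetical toy data: one row of mark signals per genomic window.

    Enhancer windows (label +1) are shifted by +1 on the informative marks,
    non-enhancers (label -1) by -1; every other mark is pure noise.
    """
    X, y = [], []
    for _ in range(n):
        label = rng.choice((-1, 1))
        X.append([rng.gauss(label if j in informative else 0.0, 1.0) for j in range(n_marks)])
        y.append(label)
    return X, y


def project(X: List[List[float]], mask: Mask) -> List[List[float]]:
    """Keep only the selected marks and append a constant 1.0 as a bias feature."""
    return [[v for v, keep in zip(row, mask) if keep] + [1.0] for row in X]


def train_linear_svm(X, y, lam: float = 0.01, epochs: int = 10, seed: int = 0) -> List[float]:
    """Linear soft-margin SVM trained with Pegasos (stochastic sub-gradient on the hinge loss)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this window
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w: List[float], X, y) -> float:
    preds = [1 if sum(a * b for a, b in zip(w, row)) >= 0 else -1 for row in X]
    return sum(p == label for p, label in zip(preds, y)) / len(y)


def fitness(mask: Mask, data: Tuple[Data, Data], cache: Dict[Mask, float], penalty: float = 0.01) -> float:
    """Validation accuracy minus a cost per mark, so smaller mark panels are preferred."""
    if mask not in cache:
        if not any(mask):
            cache[mask] = 0.0  # no marks selected, no model
        else:
            (X_tr, y_tr), (X_va, y_va) = data
            w = train_linear_svm(project(X_tr, mask), y_tr)
            cache[mask] = accuracy(w, project(X_va, mask), y_va) - penalty * sum(mask)
    return cache[mask]


def genetic_search(data, n_marks: int, pop_size: int = 12, generations: int = 15,
                   p_mut: float = 0.1, seed: int = 1) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(m: Mask) -> float:
        return fitness(m, data, cache)

    pop = [tuple(rng.random() < 0.5 for _ in range(n_marks)) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=score, reverse=True)[:2]  # elitism: keep the two best masks
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_marks)  # single-point crossover
            child = a[:cut] + b[cut:]
            nxt.append(tuple(bit != (rng.random() < p_mut) for bit in child))  # bit-flip mutation
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    rng = random.Random(42)
    informative = (0, 2)  # hypothetical: only marks 0 and 2 carry enhancer signal
    data = (make_toy_data(400, 6, informative, rng), make_toy_data(200, 6, informative, rng))
    best, score = genetic_search(data, 6)
    print("selected marks:", [j for j, keep in enumerate(best) if keep], "fitness:", round(score, 3))
```

On this toy data the search should typically keep the two informative marks and drop most noise marks. The per-mark penalty is what steers the GA toward small panels, mirroring the abstract's point that requiring too many marks is experimentally unviable; a real pipeline would also tune SVM hyperparameters and use genome-wide ChIP-seq signal rather than simulated windows.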
Similar resources
Prediction of soil cation exchange capacity using support vector regression optimized by genetic algorithm and adaptive network-based fuzzy inference system
Soil cation exchange capacity (CEC) is a parameter that represents soil fertility. Because CEC is difficult to measure, pedotransfer functions (PTFs) can be routinely applied to predict it from soil physicochemical properties that are easily measured. This study developed support vector regression (SVR) combined with a genetic algorithm (GA), together with the adaptive network-based fuzzy infe...
Predicting cardiac arrhythmia on ECG signal using an ensemble of optimal multicore support vector machines
The use of artificial intelligence in diagnosing heart disease has been studied by researchers for many years. In this paper, an efficient method for selecting appropriate features extracted from electrocardiogram (ECG) signals, based on a genetic algorithm, for use in an ensemble of multi-kernel support vector machine classifiers, each of which is based on an optimized genetic al...
DELTA: A Distal Enhancer Locating Tool Based on AdaBoost Algorithm and Shape Features of Chromatin Modifications
Accurate identification of DNA regulatory elements has become an urgent need in the post-genomic era. Recent genome-wide chromatin-state mapping efforts revealed that DNA elements are associated with characteristic chromatin modification signatures, on the basis of which several approaches have been developed to predict transcriptional enhancers. However, their practical application is limited by incomp...
A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels
The limiting velocity in open channels that prevents long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single-layer feed-forward neural network (SLFNN) with a high training speed. The dimensionless limiting-velocity parameter, known as the densimetric Froude number (Fr), is predicte...
Application of Genetic Algorithm Based Support Vector Machine Model in Second Virial Coefficient Prediction of Pure Compounds
In this work, a Least Squares Support Vector Machine model boosted by a Genetic Algorithm, an improved version of the Support Vector Machine model that solves a set of linear equations instead of a quadratic program, was used to estimate the second virial coefficients of 98 pure compounds. The compounds were classified into different groups. Optimal parameters were obtained by the Genetic Algorithm method ...